An Optimized Framework for Matrix Factorization on the New Sunway Many-core Platform


Abstract

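Matrix factorization functions are used in many areas and often play an important role in the overall performance of applications. In the LAPACK library, matrix factorization functions are implemented with a blocked factorization algorithm, shifting most of the workload to the high-performance Level-3 BLAS functions. But the non-blocked part, the panel factorization, becomes the performance bottleneck, especially for small- and medium-size matrices, which are the common cases in many real applications. On the new Sunway many-core platform, this bottleneck can be alleviated by keeping the panel in the LDM during the panel factorization. Therefore, we propose a new framework for implementing matrix factorization functions on the new Sunway many-core platform, facilitating in-LDM panel factorization. The framework provides a template class with wrapper functions, which integrates inter-CPE communication for the Level-1 and Level-2 BLAS functions, with flexible interfaces that accommodate different data partitioning schemes. With the framework, writing panel factorization code with data residing in the LDM space can be done with much higher productivity. We implemented three functions (dgetrf, dgeqrf, and dpotrf) based on the framework and compared our work with a CPE_BLAS version, which uses the original LAPACK implementation linked with an optimized BLAS library that runs on the CPE mesh. Using the most favorable partitioning, the panel factorization part achieves speedups of up to 26.3, 19.1, and 18.2 for the three functions, respectively. For the whole function, our implementation is based on a carefully tuned recursion framework, and we added specific optimizations to some subroutines used in the factorization functions. Overall, we obtained average speedups of 9.76 on dgetrf, 10.12 on dgeqrf, and 4.16 on dpotrf compared to the CPE_BLAS version. Based on the current template class, our work can be extended to support more categories of linear algebra functions.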
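The split between the panel step and the Level-3 BLAS work that the abstract describes can be illustrated with a minimal blocked Cholesky factorization, the operation dpotrf performs. This is only a sketch in plain Python under stated assumptions: the function names cholesky_unblocked and cholesky_blocked and the block size nb are hypothetical, the loops stand in for the BLAS calls a real implementation would make, and nothing here models the paper's framework, the LDM, or inter-CPE communication.

```python
import math


def cholesky_unblocked(a, k, kb):
    """Factor the kb-by-kb diagonal block at (k, k) in place (lower triangle).

    This is the non-blocked step, built from Level-1/Level-2 style operations.
    """
    for j in range(k, k + kb):
        s = a[j][j] - sum(a[j][p] ** 2 for p in range(k, j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[j][j] = math.sqrt(s)
        for i in range(j + 1, k + kb):
            dot = sum(a[i][p] * a[j][p] for p in range(k, j))
            a[i][j] = (a[i][j] - dot) / a[j][j]


def cholesky_blocked(a, nb=2):
    """Return lower-triangular L with L * L^T == a, using block size nb."""
    n = len(a)
    l = [row[:] for row in a]  # work on a copy
    for k in range(0, n, nb):
        kb = min(nb, n - k)
        end = k + kb
        # 1. Non-blocked factorization of the diagonal block.
        cholesky_unblocked(l, k, kb)
        # 2. Triangular solve (Level-3, like dtrsm): L21 = A21 * L11^{-T}.
        for i in range(end, n):
            for j in range(k, end):
                dot = sum(l[i][p] * l[j][p] for p in range(k, j))
                l[i][j] = (l[i][j] - dot) / l[j][j]
        # 3. Trailing update (Level-3, like dsyrk): A22 -= L21 * L21^T.
        for i in range(end, n):
            for j in range(end, i + 1):
                l[i][j] -= sum(l[i][p] * l[j][p] for p in range(k, end))
    # Clear the strictly upper triangle, which was never referenced.
    for i in range(n):
        for j in range(i + 1, n):
            l[i][j] = 0.0
    return l


if __name__ == "__main__":
    A = [[4, 2, 2], [2, 5, 3], [2, 3, 6]]
    for row in cholesky_blocked(A, nb=2):
        print(row)
```

For large matrices, steps 2 and 3 dominate the arithmetic and map to well-optimized Level-3 routines, so the cost of step 1 is hidden. For small and medium sizes, step 1 takes a larger share of the run time, which is the bottleneck the abstract targets.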


Similar resources

Optimized Dense Matrix Multiplication on a Many-Core Architecture

Traditional parallel programming methodologies for improving performance assume cache-based parallel systems. However, new architectures, like the IBM Cyclops-64 (C64), belong to a new set of manycore-on-a-chip systems with a software managed memory hierarchy. New programming and compiling methodologies are required to fully exploit the potential of this new class of architectures. In this pape...


Matrix Multiplication Parallelization on a Many-Core Platform

This paper introduces an approach to analyze the power and energy consumption of a many-core system. The investigation has been done by using the Intel SCC system as an experimental platform. The approach is to collect the time and power profiling of an executing application on the Intel SCC system. And then, we find the total energy consumed for the entire execution. We studied the effects of ...


Accelerating Non-Negative Matrix Factorization for Audio Source Separation on Multi-Core and Many-Core Architectures

Non-negative matrix factorization (NMF) has been successfully used in audio source separation and parts-based analysis; however, iterative NMF algorithms are computationally intensive, and therefore, time to convergence is very slow on typical personal computers. In this paper, we describe high performance parallel implementations of NMF developed using OpenMP for shared-memory multicore system...


Pipelining Computation and Optimization Strategies for Scaling GROMACS on the Sunway Many-Core Processor

The increasing gap between plentiful computing elements and limited memory bandwidth makes it increasingly difficult and sometimes even infeasible for HPC community to port more applications onto many-core processor architectures. The Sunway many-core processor SW26010 used to build the Sunway TaihuLight System contains a total of 260 heterogeneous cores. All these cores can be divided into 4...


An Investigation about the Appropriate Stochastic Modeling Framework for Agricultural Insurance Pricing

Given that agricultural crop insurance in Iran is largely supportive in nature and reported losses generally exceed the premiums collected, this thesis uses shot noise processes as a suitable model for pricing agricultural crop insurance (rainfed wheat). Based on Agricultural Insurance Fund data on the losses reported for rainfed wheat in the 1388-1389 crop year, in this thesis the net and gross premium of this crop...


Journal

Journal title: ACM Transactions on Architecture and Code Optimization

Year: 2023

ISSN: 1544-3973, 1544-3566

DOI: https://doi.org/10.1145/3571856